Detection of computer generated papers in scientific literature

Authors

  • Cyril Labbé
  • François Portet
Abstract

Meaningless computer-generated scientific texts can be used in several ways. For example, they have allowed Ike Antkare to become one of the most highly cited scientists of the modern world. Such fake publications are also appearing in real scientific conferences and, as a result, in bibliographic services (Scopus, ISI Web of Knowledge, Google Scholar, ...). Recently, more than 120 papers have been withdrawn from the subscription databases of two high-profile publishers, IEEE and Springer, because they had been generated by the SCIgen software. This software, based on a Probabilistic Context-Free Grammar (PCFG), was designed to randomly generate computer science research papers. PCFGs and Markov Chains (MC) are the main ways to generate meaningless texts. This paper presents the main characteristics of texts generated by PCFG and MC. For the time being, PCFG generators are quite easy to spot automatically, using intertextual distance combined with automatic clustering, because these generators behave like authors with specific features such as a very low vocabulary richness and unusual sentence structures. This shows that quantitative tools are effective for characterizing the originality (or banality) of authors' language.

Cyril Labbé: Univ. Grenoble Alpes, LIG, F-38000 Grenoble, France; CNRS, LIG, F-38000 Grenoble, France
Dominique Labbé: Univ. Grenoble Alpes, PACTE, F-38000 Grenoble, France; CNRS, PACTE, F-38000 Grenoble, France
François Portet: Univ. Grenoble Alpes, LIG, F-38000 Grenoble, France; CNRS, LIG, F-38000 Grenoble, France
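The two quantitative signals named in the abstract, vocabulary richness and intertextual distance, can be illustrated with a minimal Python sketch. The abstract does not give the authors' exact formulas, so everything here is an assumption made for illustration: the function names, the whitespace tokenizer, the type-token ratio as a richness proxy, and the particular frequency-based distance (the longer text's word counts are scaled to the shorter text's length before comparing).

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer (assumption)."""
    return text.lower().split()


def type_token_ratio(tokens):
    """Vocabulary richness proxy: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def intertextual_distance(tokens_a, tokens_b):
    """Frequency-based distance in [0, 1] between two token lists.

    Illustrative formula, not necessarily the one used in the paper:
    the longer text's counts are scaled down to the shorter text's length,
    then absolute count differences are summed and normalized.
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both texts must be non-empty")
    # Order the texts so that the longer one is the one scaled down.
    short, long_ = sorted((tokens_a, tokens_b), key=len)
    freq_short, freq_long = Counter(short), Counter(long_)
    scale = len(short) / len(long_)
    vocab = set(freq_short) | set(freq_long)
    # Counter returns 0 for words absent from a text.
    diff = sum(abs(freq_short[w] - freq_long[w] * scale) for w in vocab)
    # After scaling, both texts weigh len(short) tokens, so diff <= 2 * len(short).
    return diff / (2 * len(short))
```

Under these assumptions, two texts with the same word proportions are at distance 0 and two texts with no word in common are at distance 1. A matrix of pairwise distances over a corpus could then be handed to any standard hierarchical clustering routine, which is one plausible reading of "combined with automatic clustering".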


Similar resources

Engineering a Tool to Detect Automatically Generated Papers

In the last decade, a number of nonsensical, automatically generated scientific papers have been published, most of them produced using probabilistic context-free grammar generators. Such papers may also appear in scientific social networks or in open archives and thus bias the computation of metrics. This shows that there is a need for an automatic detection process to discover and remove such nonse...


Performance Evaluation of Medical Image Retrieval Systems Based on a Systematic Review of the Current Literature

Background and Aim: Images, as a kind of information vehicle that can convey a large volume of information, are especially important in the field of medicine. The existence of different attributes of image features and various search algorithms in medical image retrieval systems, and the lack of an authority to evaluate the quality of retrieval systems, make a systematic review in medical image retrieval system...


What Is Resilience and How Can It Be Nurtured? A Systematic Review of Empirical Literature on Organizational Resilience

Background Recent health system shocks such as the Ebola outbreak of 2014–2016 and the global financial crisis of 2008 have generated global health interest in the concept of resilience. The concept is however not new, and has been applied to other sectors for a longer period of time. We conducted a review of empirical literature from both the health and other sectors to synthesize evidence on ...


Algorithmic Detection of Computer Generated Text

Computer generated academic papers have been used to expose a lack of thorough human review at several computer science conferences. We assess the problem of classifying such documents. After identifying and evaluating several quantifiable features of academic papers, we apply methods from machine learning to build a binary classifier. In tests with two hundred papers, the resulting classifier ...


A brief review of plagiarism in medical scientific research papers [RETRACTED]

[THIS ARTICLE IS RETRACTED] Plagiarism refers to “adopting someone else’s words, work or ideas and passing them off as one’s own”. It is potentially considered the most prevalent form of scientific dishonesty discovered in research papers. The present review aims to provide a thorough account of plagiarism to build awareness about all dimensions of plagiarism. The key...




Publication date: 2015